On Optical Character Recognition of Arabic Text

Authors

  • Abdelmalek Zidouri
  • Muhammad Sarfraz
Abstract

Although optical character recognition has made tremendous advances in the area of desktop publishing, a great deal of work remains to be done. Unlike languages written in Roman-like scripts, various languages have a large number of fonts and/or complicated character shapes. Arabic is one of these languages and is somewhat complicated in its construction. Although a reasonable amount of work on Arabic has been reported so far, much still needs to be developed. In addition, many other languages need considerable attention for automatic generation and recognition. Efficient, robust, and error-free methodologies are required to develop systems for such languages so that recent display and printing hardware technologies can be utilized. This work is devoted to one way of addressing the problem of recognizing the Arabic alphabet. We give a brief survey of the state of the art in Arabic character recognition and of the different methods and approaches to this problem. We show that recognition can be achieved by simple matching against prebuilt prototypes of the entire Arabic character set. This segmentation-free approach proved efficient for the recognition of one font of the Arabic language. We treat Arabic as a well-structured language and base our prototype description on a method called “Minimum Covering Run Expression”. We also show that our database of prototypes is easily extended to allow multifont recognition of Arabic, as a basis for a full Arabic OCR system.
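As a rough illustration of recognition by matching against prebuilt prototypes, the Python sketch below encodes a binary glyph as horizontal runs of ink and picks the prototype with the fewest mismatched pixels. It is not the Minimum Covering Run Expression described in the abstract: plain row-wise run-length encoding is swapped in as a simplified stand-in. The function names, the toy prototype shapes, and the mismatch measure are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

# A glyph is a list of equal-length rows; '#' is ink, '.' is background.
Bitmap = List[str]
Run = Tuple[int, int, int]  # (row, start column, length)


def horizontal_runs(bitmap: Bitmap) -> List[Run]:
    """Return every maximal horizontal run of ink, row by row."""
    runs: List[Run] = []
    for r, row in enumerate(bitmap):
        c = 0
        while c < len(row):
            if row[c] == "#":
                start = c
                while c < len(row) and row[c] == "#":
                    c += 1
                runs.append((r, start, c - start))
            else:
                c += 1
    return runs


def run_pixels(runs: List[Run]) -> Set[Tuple[int, int]]:
    """Expand runs back into the set of (row, column) ink positions."""
    return {(r, s + k) for r, s, n in runs for k in range(n)}


def mismatch(a: Bitmap, b: Bitmap) -> int:
    """Count pixels that are ink in exactly one of the two glyphs."""
    return len(run_pixels(horizontal_runs(a)) ^ run_pixels(horizontal_runs(b)))


def classify(glyph: Bitmap, prototypes: Dict[str, Bitmap]) -> str:
    """Return the label of the prototype closest to the glyph."""
    return min(prototypes, key=lambda label: mismatch(glyph, prototypes[label]))


# Hypothetical toy prototypes, not real Arabic character shapes.
PROTOTYPES: Dict[str, Bitmap] = {
    "vertical_bar": ["..#..", "..#..", "..#..", "..#.."],
    "horizontal_bar": [".....", ".....", "#####", "....."],
}

# A vertical bar with one extra ink pixel of noise.
noisy_glyph: Bitmap = ["..#..", "..#..", ".##..", "..#.."]

if __name__ == "__main__":
    print(classify(noisy_glyph, PROTOTYPES))
```

In this sketch, supporting an additional font would amount to adding more entries to the prototype dictionary, which loosely mirrors the extendable prototype database mentioned in the abstract.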

Similar resources

Word-level recognition of multifont Arabic text using a feature vector matching approach

Many text recognition systems recognize text imagery at the character level and assemble words from the recognized characters. An alternative approach is to recognize text imagery at the word level, without analyzing individual characters. This approach avoids the problem of individual character segmentation, and can overcome local errors in character recognition. A word-level recognition syste...

Off-line Handwritten Arabic Character Recognition: A Survey

The automatic recognition of text in scanned images has several applications, such as automatic postal mail sorting and searching in large volumes of documents. Although Arabic handwritten text recognition has been addressed by many researchers, it remains a challenging task due to several factors. This paper presents an overview of off-line handwritten Arabic character recognition and summarizes...

A Survey on Arabic Character Recognition

Off-line recognition of text plays a significant role in several applications, such as the automatic sorting of postal mail or the editing of old documents. It is the ability of the computer to distinguish characters and words. Automatic off-line recognition of text can be divided into the recognition of printed and handwritten characters. Off-line Arabic handwriting recognition still faces great challen...

A Finite State Model for Urdu Nastalique Optical Character Recognition

Finite state technology has long been used to model NLP (Natural Language Processing) applications; in particular, it has been applied very successfully to machine translation and speech recognition systems. Character recognition in cursive scripts or handwritten Latin script has also attracted researchers’ attention, and some research has been done in this area. Optical character recognition is the ...

A segmentation-free approach to Arabic and Urdu OCR

In this paper, we present a generic Optical Character Recognition system for Arabic script languages called Nabocr. Nabocr uses OCR approaches specific for Arabic script recognition. Performing recognition on Arabic script text is relatively more difficult than Latin text due to the nature of Arabic script, which is cursive and context sensitive. Moreover, Arabic script has different writing st...

An Arabic optical character recognition system using recognition-based segmentation

Optical character recognition (OCR) systems improve human-machine interaction and are widely used in many areas. The recognition of cursive scripts is a difficult task as their segmentation suffers from serious problems. This paper proposes an Arabic OCR system, which uses a recognition-based segmentation technique to overcome the classical segmentation problems. A newly developed Arabic word segm...

Publication date: 2002